Parrotpark: Why and How to self-host LLMs

Jonas Stettner | CorrelAid @ CDL

2025-05-07

Agenda

  1. Why self-hosting?
  2. How to self-host? (Comparison of options)
  3. Introduction to Parrotpark
  4. Demonstration
  5. Discussion

Why self-host: Use of proprietary LLM applications

  • Dependence, lack of transparency and little control:
    • Data processing (GDPR)
    • Resource consumption
    • Properties and training of the models
    • Model and tool usage/configuration; e.g. web search (🗲 GUI apps such as GPT Builder)

Alternative: Self-Hosting

  • Chat interface and API bridge are trivial to self-host - ✅ Model and tool usage/configuration
  • LLM inference:
    • Azure OpenAI on EU servers - ✅ GDPR
    • Open models - ✅ More transparent model
      • API services hosted in the EU
      • Dedicated GPU server - ✅ Fully transparent resource consumption (only inference)

Dedicated vs API: Costs for EU Provider Scaleway

  • Claude 4 Opus on OpenRouter: $15/M input tokens; $75/M output tokens
  • GPT-4o on OpenRouter: $2.50/M input tokens; $10/M output tokens

Dedicated vs API: Costs for EU Provider Scaleway

  • How much VRAM can we afford? An L4 with 24 GB limits both model choice and context window size
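A rough back-of-the-envelope sketch of why 24 GB constrains model choice: the weights of a 24B-parameter model alone need about params × bits/8 bytes, before any KV cache for the context window (the 24B figure matches the Mistral Small model used later; the rest is a simplifying assumption).

```python
# Rough VRAM estimate: model weights only.
# KV cache and activations need additional memory on top of this,
# which is what limits the usable context window.
def weight_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate GPU memory for model weights, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 24B model on a 24 GB L4:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_vram_gb(24, bits):.0f} GB")
# 16-bit weights (48 GB) do not fit at all; 4-bit quantization (~12 GB)
# leaves headroom for the KV cache, i.e. a usable context window.
```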

Dedicated vs API: Example Pricing Calculation

  • Scaleway allows automated GPU instance creation (unlike Hetzner), so we deploy only during working hours
    • \(\text{Cost} = \text{€}0.75 \times (10\,\text{h} \times 5\,\text{days} \times 4\,\text{weeks}) = \text{€}150\)
    • Including tax (19%): €178.50
  • Mistral Small 3.2 24B via OpenRouter (assuming 50/50 input/output split):
    • €178.50 = $210.27 (at €1 = $1.178)
    • \(\frac{\text{\$}105.14}{0.05} + \frac{\text{\$}105.14}{0.10}\) = 3,154M tokens
    • Per working day: \(\frac{3{,}154}{20} = 158\,\text{M tokens/day}\)
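The calculation above can be reproduced in a few lines. The OpenRouter prices for Mistral Small 3.2 24B ($0.05/M input, $0.10/M output tokens) and the €/$ rate are taken from the slide and may change.

```python
# Break-even: how many OpenRouter tokens does one month of L4 rent buy?
gpu_eur = 0.75 * (10 * 5 * 4)         # €0.75/h × 10 h/day × 5 days × 4 weeks
gpu_eur_gross = gpu_eur * 1.19        # including 19% tax
budget_usd = gpu_eur_gross * 1.178    # at €1 = $1.178 (slide's assumed rate)

half = budget_usd / 2                 # assumed 50/50 input/output split
tokens_m = half / 0.05 + half / 0.10  # $/M-token prices -> million tokens

print(f"€{gpu_eur_gross:.2f} buys ~{tokens_m:.0f}M tokens "
      f"(~{tokens_m / 20:.0f}M per working day)")
```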

Dedicated vs API: Why Dedicated?

  • Maximum control and transparency
  • More predictable/fixed costs
  • More services fit on the same GPU server
    • Embedding and reranking models
    • Frontend and API bridge
  • Exact metrics on hardware and inference server level

What is Parrotpark?🦜

  • Infrastructure project: Self-hosting of LLMs, LLM APIs and frontends
  • Defined conditions
    • Servers in the EU with as much GPU memory as NPOs could afford (20 GB)
    • Hosting of open models
    • German-language output
  • Research and development: How far can you get under the conditions, which applications can be implemented?

High Level Overview


Implementation

  • IaC (Infrastructure as Code): Replicable infrastructure through declaration as code
  • Quantization of smaller models
  • Use of existing open source projects
    • LLM inference server
    • Frontends for interaction with LLMs
  • So far no fine-tuning, but instead prompt optimization for certain applications

Evaluation

  • Time window: June 17th to June 27th (9 working days)
  • Scraped Metrics: http://mtbs.correlaid.org/public/dashboard/6032e4e9-e87a-49d7-bd67-f0d92552cc1c
  • User Survey

Evaluation: Tokens and Pricing

  • Total processed tokens: 329,503 input / 103,083 output
  • ❌ An API service for the same model would have cost far less: ~$0.027 vs ~€178.50/2 ≈ €89
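The ~$0.027 figure follows directly from the measured token counts and the same assumed OpenRouter prices ($0.05/M input, $0.10/M output tokens):

```python
# API cost for the tokens actually processed during the evaluation window.
input_tokens = 329_503
output_tokens = 103_083

# Assumed OpenRouter prices for Mistral Small 3.2 24B, in $ per million tokens.
api_cost_usd = input_tokens / 1e6 * 0.05 + output_tokens / 1e6 * 0.10
print(f"${api_cost_usd:.3f}")  # ≈ $0.027
```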

Demonstration

  1. GUI
  2. Power consumption measurement